Intelligent disk array controller background surface analysis.

(57) A computer system includes a method for performing background disk sector analysis for drives,
including drives dedicated to redundancy and/or fault recovery techniques, in an intelligent
microprocessor based disk array (116). The method directs a microprocessor (20) to wait a specified time and
test for disk activity. In the absence of disk activity, the disk controller (112) is directed to generate a
read request for a disk memory location within the array. A return code following the read is checked to
determine if the read failed, indicating a disk drive media failure. The disk controller (112) is then
notified if a failure occurs. The processor again checks for disk array activity and in the absence of
activity issues a read request for successive locations within the array, thereby reading all disk memory
locations within the array (116).
The present invention relates to the control of disk drives within a computer system and more particularly
to a method for carrying out disk drive sector analysis in a background mode in an intelligent mass storage 
disk drive array subsystem for a personal computer. 

Microprocessors and the personal computers which utilize them have become more powerful over the
recent years. Currently available personal computers have capabilities easily exceeding mainframe computers
of 20 to 30 years ago and approach capabilities of many mainframe and minicomputers currently manufactured.
Microprocessors having word sizes of 32 bits are now widely available, whereas in the past, 8 bits was
conventional and 16 bits was common.

Personal computer systems have developed over the years and new uses are being discovered daily. The
uses are varied and, as a result, have different requirements for the various subsystems forming a complete
computer system. Because of production volume requirements and resultant economies of scale it is desirable that
as many common features as possible are combined into high volume units. This has happened in the personal
computer area by developing a basic system unit which generally contains a power supply, provisions for
physically mounting the various mass storage devices and a system board, which in turn incorporates a
microprocessor, microprocessor related circuitry, connectors for receiving circuit boards containing other
subsystems, circuitry related to interfacing the circuit boards to the microprocessor and memory. The use of
connectors and interchangeable circuit boards allows subsystems of the desired capability for each computer
system to be easily incorporated into the computer system. The use of interchangeable circuit boards necessitated
the development of an interface or bus standard so that the subsystems could be easily designed and problems
would not result from incompatible decisions by the system unit and the interchangeable circuit board
designers.

The use of interchangeable circuit boards and an interface standard, commonly called a bus specification
because the various signals are provided to all the connectors over a bus, was incorporated into the original
International Business Machines Corporation (IBM) personal computer, the IBM PC. The IBM PC utilized an
Intel Corporation 8088 as the microprocessor. The 8088 has an 8 bit, or 1 byte, external data interface but
operates on a 16 bit word internally. The 8088 has 20 address lines, which means that it can directly address a
maximum of 1 Mbyte of memory. In addition, the memory components available for incorporation in the original
IBM PC were relatively slow and expensive as compared to current components. The various subsystems, such
as video output units or mass storage units, were not complex and also had relatively low performance levels
because of the relative simplicity of the devices available at a reasonable cost.

With these various factors and component choices in mind, an interface standard was developed and used
in the IBM PC. The standard utilized 20 address lines and 8 data lines, individual lines to indicate input or output
(I/O) or memory space read/write operations, and had limited availability of interrupts and direct memory access
(DMA) channels. The complexity of the available components did not require greater flexibility or capabilities
of the interface standard to allow the necessary operations to occur. This interface standard was satisfactory
for a number of years.

As is inevitable in the computer and electronics industry, capabilities of the various components available
increased dramatically. Memory component prices dropped and capacities and speeds increased. The performance
rate and capacities of the mass storage subsystems increased, generally by the incorporation of hard disk
units for previous floppy disk units. The video processor technology improved so that high resolution color
systems were reasonably affordable. These developments all pushed the capabilities of the existing IBM PC
interface standard so that the numerous limitations in the interface standard became a problem. With the introduction
by Intel Corporation of the 80286, IBM developed a new, more powerful personal computer called the AT. The
80286 has a 16 bit data path and 24 address lines so that it can directly address 16 Mbytes of memory. In
addition, the 80286 has an increased speed of operation and can easily perform many operations which taxed 8088
performance limits.
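
The memory limits quoted for the 8088 and the 80286 follow directly from the number of address lines. The short sketch below reproduces that arithmetic; the function name `addressable_bytes` is illustrative and not part of the original text, and byte addressing is assumed.

```python
def addressable_bytes(address_lines: int) -> int:
    """Bytes reachable with the given number of address lines (byte addressing assumed)."""
    if address_lines < 0:
        raise ValueError("address_lines must be non-negative")
    return 2 ** address_lines


MBYTE = 2 ** 20

# 8088: 20 address lines, 80286: 24 address lines (figures taken from the text).
for name, lines in [("8088", 20), ("80286", 24)]:
    print(f"{name}: {addressable_bytes(lines) // MBYTE} Mbyte(s)")
```

Each additional address line doubles the reachable memory, which is why four extra lines take the limit from 1 Mbyte to 16 Mbytes.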

It was desirable that the existing subsystem circuit boards be capable of being used in the new AT, so the
interface standard used in the PC was utilized and extended. A new interface standard was developed, which
has become known as the industry standard architecture (ISA). A second connector for each location was
added to contain additional lines for the signals used in the extension. These lines included additional address
and data lines to allow the use of the 24 bit addressing capability and 16 bit data transfers, additional interrupt
and direct memory access lines and lines to indicate whether the subsystem circuit board was capable of using
the extended features. While the address values are presented by the 80286 microprocessor relatively early
in the operation cycle, the PC interface standard could not utilize the initial portions of the address availability
because of different timing standards for the 8088 around which the PC interface was designed. This limited
the speed at which operations could occur because they were now limited to the interface standard memory
timing specifications and could not operate at the rate available with the 80286. Therefore, the newly added
address lines included address signals previously available, but the newly added signals were available at an
early time in the cycle.
This change in the address signal timing allowed operations which utilized the extended portions of the
architecture to operate faster. With higher performance components available it became possible to have a master
unit other than the system microprocessor or direct memory access controller operating the bus. However,
because of the need to cooperate with circuit boards which operated under the new 16 bit standard or the old
8 bit standard, each master unit was required to understand and operate with all the possible combinations of
circuit boards. This increased the complexity of the master unit and resulted in a duplication of components,
because the master unit had to incorporate many of the functions and features already performed by the logic
and circuitry on the system board and other master units. Additionally, the master unit was required to utilize
the direct memory access controller to gain control of the bus, limiting prioritizing and the number of master
units possible in a given computer system. The capabilities of components continued to increase. Memory
speeds and sizes increased, mass storage unit speeds and sizes increased, video unit resolutions increased and Intel
Corporation introduced the 80386 microprocessor. The increased capabilities of the components created a
desire for the use of master units, but the performance of a master unit was limited by the ISA specification and
capabilities. The 80386 could not be fully utilized because it offered the capability to directly address 4 Gbytes of
memory using 32 bits of address and could perform 32 bit wide data transfers, while the ISA standard allowed
only 16 bits of data and 24 bits of address. The local area network (LAN) concept, where information and files
are stored on one computer called a server and distributed to local work stations having limited or no mass storage
capabilities, started becoming practical with the relatively low cost of high capability components needed for
adequate servers and the low costs of the components for work stations. An extension similar to that performed
in developing the ISA could be implemented to utilize the 80386's capabilities. However, this type of extension
would have certain disadvantages. With the advent of the LAN concept and the high performance requirements
of the server and of video graphics work stations used in computer-aided design and animation work, the need
for very high data transfer rates became critical. An extension similar to that performed in developing the ISA
would not provide this capability, even if slightly shorter standard cycle times were provided, because this would
still leave the performance below desired levels.

With the increased performance of computer systems, it became apparent that mass storage subsystems,
such as fixed disk drives, played an increasingly important role in the transfer of data to and from the computer
system. In the past few years, a new trend in mass storage subsystems has emerged for improving data transfer
performance, capacity and reliability. This is generally known as a disk array subsystem. A number of reference
articles on the design of disk arrays have been published in recent years. These include "Considerations in
the Design of a RAID Prototype" by M. Schulze, Report No. UCB/CSD 88/448, August 1988, Computer Science
Division, University of California, Berkeley; "Coding Techniques for Handling Failures in Large Disk Arrays" by
G. Gibson, L. Hellerstein, R. Karp, R. Katz and D. Patterson, Report No. UCB/CSD 88/477, December 1988,
Computer Science Division, University of California, Berkeley; and "A Case Study for Redundant Arrays of
Inexpensive Disks (RAID)" by D. Patterson, G. Gibson, and R. Katz, presented at the June 1988 ACM SIGMOD
Conference in Chicago, Illinois.

One reason for wanting to build a disk array subsystem is to create a logical device that has a very high
data transfer rate. This may be accomplished by ganging multiple standard disk drives together and transferring
data to or from these drives to the computer system memory. If n drives are ganged together, then the effective
data transfer rate is increased n times. This technique, called striping, originated in the super computing
environment where the transfer of large amounts of data to and from secondary storage is a frequent requirement.
With this approach, multiple physical drives may be addressed as a single logical device and may be
implemented either through software or hardware. Disk array subsystems may also be configured to provide data
redundancy and/or data recovery capability.
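
As an illustration of striping, the sketch below maps a logical block number onto one of n ganged drives. Round-robin placement at block granularity and the name `locate_block` are assumptions made for this example; the text does not specify a stripe size or a mapping.

```python
def locate_block(logical_block: int, n_drives: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive).

    Assumes simple round-robin striping with a stripe unit of one block.
    """
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % n_drives, logical_block // n_drives


# Consecutive logical blocks land on different drives, so a large
# sequential transfer keeps all four drives working in parallel.
for block in range(6):
    drive, offset = locate_block(block, n_drives=4)
    print(f"logical block {block} -> drive {drive}, offset {offset}")
```

Because n consecutive blocks sit on n different drives, they can be transferred concurrently, which is the source of the n-fold increase in effective transfer rate described above.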

Two data redundancy and recovery techniques have generally been used to restore data in the event of
a catastrophic drive failure. One technique is the mirrored drive. A mirrored drive in effect creates a redundant
data drive for each data drive. A write to a disk array utilizing the mirrored drive fault tolerance technique will
result in a write to the primary data disk and a write to its mirror drive. This technique results in a minimum loss
of performance in the disk array. However, there exist certain disadvantages to the use of mirrored drive fault
tolerance techniques. The primary disadvantage is that this technique uses 50% of total data storage available
for redundancy purposes. This results in a relatively high cost of storage per available byte.

Another technique is the use of a parity scheme which reads data blocks being written to various drives
within the array and uses a known exclusive or (XOR) technique to create parity information which is written
to a reserved or parity drive in the array. The advantage to this technique is that it may be used to minimize
the amount of data storage dedicated to data redundancy and recovery. In an 8 drive array, the parity technique
would call for one drive to be used for parity information; 12.5% of total storage is dedicated to redundancy as
compared to 50% using the mirrored drive fault tolerance technique. The use of parity drive techniques
decreases the cost of data storage. However, there are a number of disadvantages to the use of parity fault
tolerance techniques. The primary among them is the loss of performance within the disk array as the parity drive
must be updated each time a data drive is updated. The data must undergo the XOR process in order to write
to the parity drive as well as writing the data to the data drives.
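
The storage cost of the two techniques can be checked with a line of arithmetic. The helper below is only an illustration (the name `redundancy_fraction` is not from the original text); it expresses the share of raw capacity given over to redundancy.

```python
def redundancy_fraction(total_drives: int, redundant_drives: int) -> float:
    """Fraction of total array capacity dedicated to redundancy."""
    if total_drives < 1 or not 0 <= redundant_drives <= total_drives:
        raise ValueError("invalid drive counts")
    return redundant_drives / total_drives


# Mirroring an 8 drive array pairs every data drive with a mirror drive.
print(f"mirrored: {redundancy_fraction(8, 4):.1%}")
# A single parity drive protects the other seven drives.
print(f"parity:   {redundancy_fraction(8, 1):.1%}")
```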

A second disadvantage is the use of the system processor to perform XOR parity information generation.
This requires that the drive data be transferred from the drives, to a transfer buffer, to the system processor
local memory to create the XOR parity information and that the data be sent back to the drives via the transfer
buffer. As a result, the host system processor encounters a significant overhead in managing the generation
of the XOR parity data. The use of a local processor within the disk array controller also encounters many of
the same problems that a system processor would. The drive data must again go from the drives to a transfer
buffer to local processor memory to generate XOR parity information and then back to the parity drive via the
transfer buffer.

Related to this field of data error correction is U.S. Patent No. 4,665,978 for a data error correction system.

Implementation of either mirror or parity fault tolerance techniques in a disk array significantly improves
data redundancy and recovery capability within a disk array. However, conditions may arise which may lead
to data loss within a disk array despite the implementation of mirror or parity fault tolerance techniques. These
conditions may arise when disk sectors on a parity or mirror disk fail, followed by failure of corresponding disk
sectors on data disks. In such instances, the data cannot be regenerated as either the mirror or parity disk has
already failed and the failure is not generally detected or corrected.

For example, a database application may include a number of data records which are written initially to a
disk array and are read multiple times without the records being updated. During the initial write operation, the
data written to the data drives is used to generate XOR parity information. The data is written to data drive
sectors on data disks within the array and the corresponding XOR parity information is written to corresponding
sectors on the parity drive. When a data drive fails, the data may be regenerated using the XOR parity
information from the parity drive and the remaining data drives. While catastrophic drive failure may occur, the more
common mode of failure for a hard disk is for one or more disk sectors on a disk track to become corrupted or
unreadable. Disk sector data may also be regenerated utilizing the remaining data disks and XOR parity disk.
The corresponding sectors of the remaining data disks and the parity disk may be read to regenerate the lost
disk sector data. The regenerated data may then be written to a replacement disk as part of a total data disk
regeneration process or the regenerated data may be remapped to new sectors on the existing disk.
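
The XOR relationship that makes regeneration possible can be sketched in a few lines. This is a minimal illustration of the general parity technique, not the controller's implementation; the function names `xor_blocks`, `make_parity` and `regenerate` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of a non-empty list of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields, for each byte position, the bytes from every block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity sector for the corresponding sectors of the data drives."""
    return xor_blocks(data_blocks)


def regenerate(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data sector from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_parity(data)
# Pretend the sector on the second data drive became unreadable.
rebuilt = regenerate([data[0], data[2]], parity)
print(rebuilt == data[1])
```

Regeneration works only if every other sector in the set, including the parity sector, is still readable. That dependency is what makes an undetected parity sector failure dangerous.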

In this environment, the disk controller generally will not access the parity disk unless a write to the
corresponding data drive sectors occurs or unless the parity sectors must be read to generate data for one or more
of the data drives. However, the parity disk is just as susceptible to disk failure as the data drives within the
array. During the course of operations, sectors on the parity drive may become corrupted or fail. This failure
would not normally be detected as the parity disk is not being accessed by the disk controller. Thus, the XOR
parity information corresponding to sectors on the data disks may be corrupted or lost without the disk controller
or the user being aware of the parity disk sector failure. Should a corresponding data drive sector subsequently
become corrupted or otherwise fail, the disk array will be unable to regenerate the data despite the use of parity
fault tolerance techniques.
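
The failure sequence described above can be made concrete with a small model. The sketch assumes a stripe of three data sectors plus one parity sector and tracks only whether each sector is readable; `can_regenerate` is a hypothetical helper, not part of the disclosed method.

```python
def can_regenerate(readable: list[bool], lost_index: int) -> bool:
    """True if the sector at lost_index can be rebuilt from all other sectors.

    XOR regeneration needs every remaining sector of the stripe,
    data and parity alike, to be readable.
    """
    return all(ok for i, ok in enumerate(readable) if i != lost_index)


# Stripe layout (assumed): indices 0-2 are data sectors, index 3 is parity.
healthy_parity = [True, False, True, True]   # data sector 1 fails, parity is fine
latent_failure = [True, False, True, False]  # parity failed earlier, unnoticed

print(can_regenerate(healthy_parity, lost_index=1))  # True
print(can_regenerate(latent_failure, lost_index=1))  # False
```

In the second case the parity sector had already failed without being read, so the later data sector failure is unrecoverable.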

A similar situation may occur where mirrored drive fault tolerance techniques are utilized in a disk array.
As in the parity fault tolerance situation, instances will arise when a record is initially written to a data drive and
is read many times without the data record being updated. The initial write request will generate a write
command to the data disk and its corresponding mirror disk. Subsequent read requests will be directed by the disk
controller to sectors on the data disk without reading the corresponding sectors on the mirror disk. The controller
will not access the mirror disk unless a write command is issued to the data disk or the mirror disk is read to
regenerate data in the event of a data disk failure.

Over time, sectors on the mirror disk may become corrupted or fail. This failure will not be detected by the
disk array controller or the user as the read requests will be directed to the data disk and not the mirror disk.
Thus, the drive controller and user are unaware of the mirror disk failure. The disk array will be unable to
regenerate data should data disk sectors subsequently become corrupted or fail if their corresponding mirror drive
sectors have already become corrupted or failed.

This situation will not occur when the mirrored drive is managed using duplexed disk controllers. In a duplex
environment, separate controllers are used to manage the data and mirrored drives. The initial write request
will result in a write command being generated by each of the controllers to their respective drives. Subsequent
read requests are also sent to both drive controllers. The operating system will accept the data from either the
data or mirror disk controller based upon which controller completes the read request first. Thus, though one
disk may nominally be designated as the mirror disk, it may act as the primary data disk in many instances.
Additionally, if a sector on either drive goes bad, the error will be provided to the operating system because
each controller must respond to the request. Thus, even if the data is used from the other drive, the error is
reported.

Thus, it will be appreciated that data loss may still occur even when parity or mirror fault tolerance
techniques are used.

The present invention is for use with a personal computer having a fault tolerant, intelligent disk array
controller system, the controller being capable of managing the operation of an array of up to 8 standard integrated
disk drives connected in drive pairs without supervision by the computer system host. Specifically, the present
invention is directed toward a method for performing disk surface analysis by generating read requests for all
disks within an array, including parity and mirror disks, as a background task to verify disk sector integrity.
The method of the present invention is intended for use in an intelligent disk array subsystem controller
having a local microprocessor operating in a multi-tasking environment. The method of the present invention
initiates a background task after a predetermined time in which no disk activity takes place. The background
task creates a series of read requests which are processed by the disk controller and which cause all disk sectors
within the array, including any disks allocated to fault tolerance and data redundancy, to be read and therefore
checked for sector integrity. The individual read requests are scheduled by the local processor and return a
success or failure code indicating the integrity of the information in the sector read. The method includes a
means for notifying the host processor of a read failure code so that corrective action may be taken. The present
invention will continue to generate read requests in a background mode until all sectors within the disk array
have been read and verified. Upon completion of reading all sectors, the present invention will proceed to begin
the read verification for the entire array again. The present invention does not adversely affect disk array
subsystem performance as a result of this read verification process, as the background task will go inactive when
it detects that the disk array subsystem is actively performing read or write operations. The background task
will become active after waiting a predetermined time and test to determine if the disk array is still performing
read or write operations. If still busy, the background task will again go inactive for the predetermined period.
If the disk array is inactive, the task will resume issuing read requests for the disk array. The present invention
will continue to issue read requests until all disk memory locations within the array have been verified. Following
completion of this task, the present invention will again begin issuing read requests for the disk array, repeating
the above process.
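
The control flow summarized above (wait, test for activity, read the next sector, report a failure, wrap around) can be sketched as a simple loop. The code below is a simplified model under stated assumptions, not the controller firmware: the function `surface_analysis` and its callback parameters are hypothetical, the array is addressed as a flat range of sector numbers, and the task waits the idle period before every read rather than only after detecting activity.

```python
import time
from typing import Callable, Iterator


def surface_analysis(
    total_sectors: int,
    array_is_busy: Callable[[], bool],
    read_sector: Callable[[int], bool],
    notify_host: Callable[[int], None],
    idle_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[int]:
    """Endlessly verify sectors 0..total_sectors-1 while the array is idle.

    Yields each sector number after it has been read, so a caller (or a
    test) can step the task or stop it at any point.
    """
    if total_sectors < 1:
        raise ValueError("total_sectors must be at least 1")
    sector = 0
    while True:
        sleep(idle_delay_s)              # go inactive for the predetermined period
        if array_is_busy():              # host reads or writes in progress: back off
            continue
        if not read_sector(sector):      # a failure code indicates a bad sector
            notify_host(sector)          # let the host take corrective action
        yield sector
        sector = (sector + 1) % total_sectors  # after the last sector, start over


if __name__ == "__main__":
    from itertools import islice

    failed = []
    scan = surface_analysis(
        total_sectors=3,
        array_is_busy=lambda: False,
        read_sector=lambda s: s != 1,    # pretend sector 1 is unreadable
        notify_host=failed.append,
        sleep=lambda _seconds: None,     # skip real waiting in this demo
    )
    print(list(islice(scan, 5)), failed)
```

Writing the task as a generator keeps the sketch easy to step and test; a real controller would run it as one task among several under its scheduler, as the later sections describe.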

A better understanding of the invention can be had when the following detailed description of the preferred
embodiment is considered in conjunction with the following drawings, in which:

Figures 1 and 2 are schematic block diagrams of a computer system in which the present invention may
be practiced;

Figure 3 is a schematic block diagram of an intelligent disk array controller environment in which the present
invention may be practiced;

Figure 4 is a flow diagram of the manner in which the present invention resumes the generation of read
requests directed toward the disk array following a reset condition on the computer system upon which the
present invention is being practiced;

Figures 5A - 5C are flow diagrams of the manner in which the present invention generates read requests
for all sectors within a disk array;

Figure 6 is a flow diagram of the manner in which the present invention initializes disk array parameters prior
to generating read requests directed to the disk array;

Figures 7A and 7B are portions of a task for scheduling logical read and write requests pertinent to the
present invention;

Figure 8 is a flow diagram of the manner in which the present invention activates the generation of read
requests after a specified time of disk array inactivity; and

Figure 9 is a flow diagram of the manner in which the present invention determines if the disk array system
is currently performing read or write operations or is otherwise busy.
Detailed Description of the Preferred Embodiment

Table Of Contents

I. Computer System Overview

II. Disk Array Controller

III. Background Surface Analysis

A. Restart Surface Analysis
B. Surface Analysis Task
C. Initialisation of Surface Analysis
D. Scheduler Operations
E. Timer Function
F. Busy Function

IV. Conclusion

I. Computer System Overview

Referring now to Figures 1 and 2, the letter C designates generally a computer system upon which the
current invention may be practiced. For clarity, system C is shown in two portions, with the interconnections
between Figures 1 and 2 designated by reference to the circled numbers one to eight. System C is comprised of
a number of block elements interconnected via four buses.

In Figure 1, a computer system C is depicted. A central processing unit (CPU) comprises a processor 20,
a numerical coprocessor 22 and a cache memory controller 24 and associated logic circuits connected to a
local processor bus 26. Associated with cache controller 24 is high speed cache data random access memory
(RAM) 28, noncacheable memory address map programming logic circuitry 30, noncacheable address memory
32, address exchange latch circuitry 34 and data exchange transceiver 36. Associated with the CPU also are
local bus ready logic circuit 38, next address enable logic circuit 40 and bus request
logic circuit 42.

The processor 20 is preferably an Intel 80386 microprocessor. The processor 20 has its control, address
and data lines interfaced to the local processor bus 26. The coprocessor 22 is preferably an Intel 80387 and/or
Weitek WTL 3167 numeric coprocessor interfacing with the local processor bus 26 and the processor 20 in the
conventional manner. The cache RAM 28 is preferably suitable high-speed static random access memory which
interfaces with the address and data elements of bus 26 under control of the cache controller 24 to carry out
required cache memory operations. The cache controller 24 is preferably an Intel 82385 cache controller
configured to operate in two-way set associative master mode. In Figs. 1 and 2, the components are the 33 MHz
versions of the respective units. Address latch circuitry 34 and data transceiver 36 interface the cache controller
24 with the processor 20 and provide a local bus interface between the local processor bus 26 and a host bus
44. Circuit 38 is a logic circuit which provides a bus ready signal to control access to the local bus 26 and indicate
when the next cycle can begin. The enable circuit 40 is utilized to indicate that the next address of data or code
to be utilized by subsystem elements in pipelined address mode can be placed on the local bus 26.

Noncacheable memory address map programmer 30 cooperates with the processor 20 and the
noncacheable address memory 32 to map noncacheable memory locations. The noncacheable address memory 32 is
utilized to designate areas of system memory that are noncacheable to avoid many types of cache memory
incoherency. The bus request logic circuit 42 is utilized by the processor 20 and associated elements to request
access to the host bus 44 in situations such as when requested data is not located in the cache memory 28
and access to system memory is required.

In the drawings, system C is configured having the processor bus 26, the host bus 44, an extended industry
standard architecture (EISA) bus 46 (Fig. 2) and an X bus 90. The details of the portion of the system illustrated
in Figure 2 and not discussed in detail below are not significant to the present invention other than to illustrate
an example of a fully configured computer system. The portion of system C illustrated in Fig. 2 is essentially a
configured EISA system which includes the necessary EISA bus 46, an EISA bus controller 48, data latches
and transceivers 50 and address latches and buffers 52 to interface between the EISA bus 46 and the host
bus 44. Also illustrated in Figure 2 is an integrated system peripheral 54, which incorporates a number of the
elements used in an EISA-based computer system.
The integrated system peripheral (ISP) 54 includes a direct memory access controller 56 for controlling
access to main memory 58 (Fig. 1) or memory contained in EISA slots and input/output (I/O) locations without
the need for access to the processor 20. The main memory array 58 is considered to be local memory and
comprises a memory circuit array of size suitable to accommodate the particular requirements of the system.
The ISP 54 also includes interrupt controllers 70, nonmaskable interrupt logic 72 and system timers 74 which
allow control of interrupt signals and generate necessary timing signals and wait states in a manner according
to the EISA specification and conventional practice. In the preferred embodiment, processor generated interrupt
requests are controlled via dual interrupt control circuits emulating and extending conventional Intel 8259
interrupt controllers. The ISP 54 also includes bus arbitration logic 75 which, in cooperation with the bus controller
48, controls and arbitrates among the various requests for the EISA bus 46 by the cache controller 24, the DMA
controller 56 and bus master devices located on the EISA bus 46.

The main memory array 58 is preferably dynamic random access memory. Memory 58 interfaces with the
host bus 44 via a data buffer circuit 60, a memory controller circuit 62 and a memory mapper 68. The buffer
60 performs data transceiving and parity generating and checking functions. The memory controller 62 and
memory mapper 68 interface with the memory 58 via address multiplexer and column address strobe buffers
66 and row address enable logic circuit 64.

The EISA bus 46 includes ISA and EISA control buses 76 and 78, ISA and EISA data buses 80 and 82,
and address buses 84, 86 and 88. System peripherals are interfaced via the X bus 90 in combination with the
ISA control bus 76 from the EISA bus 46. Control and data/address transfer for the X bus 90 are facilitated by
X bus control logic 92, data transceivers 94 and address latches 96.

Attached to the X bus 90 are various peripheral devices such as keyboard/mouse controller 98 which
interfaces the X bus 90 with a suitable keyboard and mouse via connectors 100 and 102, respectively. Also attached
to the X bus 90 are read only memory (ROM) circuits 106 which contain basic operations software for the system
C and for system video operations. A serial communications port 108 is also connected to the system C via
the X bus 90. Floppy and fixed disk support, a parallel port, a second serial port and video support circuits are
provided in block circuit 110 connected to the X bus 90.

II. Disk Array Controller

The disk array controller 112 is connected to the EISA bus 46 to provide for the communication of data
and address information through the EISA bus. Fixed disk connectors 114 are connected to the fixed disk
support system and are in turn connected to a fixed disk array 116. A schematic block diagram of the disk array
controller 112 upon which the present invention may be practiced is shown in Figure 3. It is understood that the
disk controller set forth in Fig. 3 is for the purpose of illustrating the environment in which the present invention
may operate. The method of the present invention may be practiced on any personal computer system disk
array having a microprocessor based, intelligent array controller, and indeed can be performed by the system
microprocessor in conjunction with more conventional board disk control systems.

The disk array controller 112 includes a bus master interface controller 118 (BMIC), preferably an Intel
Corporation 82355, which is designed for use in a 32 bit EISA bus master expansion board and provides all EISA
control, address, and data signals necessary for transfers across the EISA bus. The BMIC 118 supports 16
and 32 bit burst transfers between the disk array system and system memory. Additionally, BMIC 118 provides
for the transfers of varying data sizes between an expansion board and EISA and ISA devices.

The disk array controller 112 also includes a compatibility port controller (CPC) 120. The CPC 120 is
designed as a communication mechanism between the EISA bus 46 and existing host driver software not designed
to take advantage of EISA capabilities.

Also included in the disk array controller 112 is a microprocessor 122, preferably an Intel Corporation 80186
microprocessor. The local processor 122 has its control, address and data lines interfaced to the BMIC 118,
CPC 120, and a transfer channel controller 124. Further, the local processor 122 is also interfaced to local read
only memory (ROM) 126 and dynamic random access memory (RAM) 128 located within the disk array
controller 112.

The transfer channel controller (TCC) 124 controls the operation of four major DMA channels that access
a static RAM transfer buffer 130 which is used to store data transferred by the disk system. The TCC 124 assigns
DMA channels to the BMIC 118, the CPC 120, the local processor 122 and to the disk array DMA channel 114.
The TCC 124 receives requests from the four channels and assigns each channel a priority level. The local
processor 122 has the highest priority level. The CPC 120 channel has the second highest priority level. The
BMIC 118 channel has the third highest priority level and the disk array DMA channel 114 has the lowest priority
level.
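
By way of illustration only, the fixed priority ordering among the four DMA channels may be modeled with a short sketch. The names `CHANNEL_PRIORITY` and `select_channel` are hypothetical and do not appear in the specification; the sketch assumes only that the highest-priority channel among those currently requesting service is selected, and does not describe how the TCC 124 performs arbitration in hardware.

```python
# Hypothetical model of the priority order among the four DMA channels.
# A lower number means a higher priority, following the order in the text.
CHANNEL_PRIORITY = {
    "local_processor_122": 0,   # highest priority
    "cpc_120": 1,
    "bmic_118": 2,
    "disk_array_dma_114": 3,    # lowest priority
}


def select_channel(requesting):
    """Return the highest-priority channel among those requesting, or None."""
    candidates = [ch for ch in requesting if ch in CHANNEL_PRIORITY]
    if not candidates:
        return None
    return min(candidates, key=CHANNEL_PRIORITY.__getitem__)
```

For example, when the BMIC channel and the disk array DMA channel request service together, this model selects the BMIC channel.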

The disk array DMA channel 114 is comprised of four disk drive subchannels. The four disk drive subchannels [...]
EP0 467 708 A2 



[...] The four drive subchannels [...] the source for the disk array DMA channel. [...] rotation. [...] DMA request.
The remaining three subchannels [...] submitted to the disk array controller [...] which the present invention
may be practiced may be had [...]
Background Surface Analysis

The present invention is a method which [...] series of read [...]

[...] information is present, control of processor [...] 600 (Fig. 6) and sets the RESET code to TRUE. In the
INITIALIZE task 600 the processor 122 determines [...] individual disk parameters, for [...] to determine if
the disk array [...] present invention. [...] is less than the initial release version [...] RIS and the version
information is updated to at least the initial release version. [...] be used with the TIMER task [...] delay may
be selectively set by the user [...] processor 20 to issue a command which updates the GLOBAL [...] TIMER
delay period. This is [...] preferred embodiment [...] initial release version, control transfers to
[...] value is set to the default value. Control is then transferred to step 662. If it is determined in step 658 that the
current RIS version is not less than the initial release version, control transfers to step 662 wherein processor
122 determines if the number of fault tolerant drives within the array, as determined by the INITIALIZE task
600, is greater than zero. If not greater than zero, control of processor 122 transfers to step 664 where the task
is stopped. If greater than zero, control transfers to step 668 which initiates the one-shot timer which is utilized
in the present invention for delay purposes prior to starting or restarting the actual surface analysis. Control
then transfers to step 666 which ends execution of the task.

B. SURFACE_ANALYSIS 

Figs. 5A-5C are flow diagrams of the SURFACE_ANALYSIS task 500 within the present invention running
on the local processor 122 which generates read requests for all disk sectors within the disk array. Operation
begins at step 502, wherein processor 122 determines whether the NEXT_ACTION variable indicates that the
action to be carried out is the issue of a background read request. NEXT_ACTION is initialized to issue a read
request. If the NEXT_ACTION value is the issue of a read request, control transfers to step 504 where the number
of pending logical read or write requests is stored to a local variable within the controller memory 128. Control
then transfers to step 506. If in step 502 it is determined that the next action to be carried out by the current
task is not an issue read request, control transfers to step 506. In step 506 the processor 122 calls the
INITIALIZE task 600. When the INITIALIZE task 600 is called in step 506, a code indicating a NON-RESET
condition is sent to the task 600. Upon return from the INITIALIZE task 600, control transfers to step 508 wherein
processor 122 checks for the number of fault tolerant drives within the array, which is determined by the
INITIALIZE task 600 called in step 506. If the number of drives is equal to zero, control transfers to step 510
wherein the processor 122 halts operation of the current task. If the number of fault tolerant drives within the
array is non-zero, control transfers to step 512 which is a switch based upon the next action to be taken.

If the NEXT_ACTION variable is set to issue a read request, control transfers to step 514. Control then
transfers to step 516 wherein the processor 122 sets the drive to be checked to the current drive being
addressed by the task 500. Control transfers to step 518 which determines whether the current drive is a bad drive
within the array, based upon parameters returned from the INITIALIZE task 600 called in step 506. If the current
drive is a bad drive, control transfers to step 520 wherein the processor 122 increments the current drive
pointer to the next drive in the array. Control then transfers to step 522. If it is determined in step 518 that the
current drive is not faulty or bad, control transfers to step 522. In step 522 the local processor 122 allocates
RAM memory 128 for a transfer buffer to return the data and condition codes resulting from the read request.
Control transfers to step 524 wherein the processor 122 tests whether there exists sufficient RAM memory 128
to allocate for the read request transfer buffer. If there is not sufficient memory 128, control transfers to step
526 where execution pauses until the TIMER task 680 reactivates or wakes up the SURFACE_ANALYSIS task
500. After the TIMER task 680 restarts the SURFACE_ANALYSIS task 500, control transfers to step 528. The
lack of sufficient memory 128 will generally occur when the disk controller 112 is attempting to perform a number
of I/O transfers. Thus, the lack of memory 128 will occur when the array controller 112 is busy. If it is determined
in step 524 that there is sufficient memory 128 for the transfer buffer associated with the read request, control
transfers to step 528 wherein the processor 122 generates a logical read request for the current drive, head
and cylinder. In a hard disk, a track references a radial position on a single rotating disk media. The track itself
is composed of multiple disk sectors which describe a radial portion of the track. The sectors per track
information is stored within the GLOBAL RIS information on each disk within the disk array. Most hard disk media
are composed of multiple coaxial disks. Thus, the same track position on each of the disks is referenced as a
cylinder. When a drive request is issued for a specific head, cylinder and disk, the disk and cylinder information
are explicitly specified and the track information is specified by the head associated with the particular disk. In
the present invention, a read request is intended to read all sectors on a particular track.

Control transfers to step 530 in which the SURFACE_ANALYSIS task 500 queues the current read request
to be acted upon by the controller 112. In the preferred embodiment, a better understanding of the method of
generation of the read request and the scheduling of the request may be had with reference to previously
referenced Application Ser. No. 431,737. Control of the processor 122 is transferred to step 532 wherein the
NEXT_ACTION variable is set to check the results of the read request and execution is paused until the read
request is completed. After the read request is completed, execution recommences and control then transfers
to step 512.

If in step 512 it is determined that the next action to be taken by the task 500 is to check the results of the
read request, control transfers to step 534 and to step 536. In step 536, the processor 122 reads the transfer
buffer to determine if an error code indicating disk sector failure has been returned by the read request. If a
read request has an error code returned, control transfers to step 538. An error code will be returned if any one
of the sectors within the track specified in the read request is faulty. In step 538 the task 500 notifies the
controller of the disk failure. The failure notification may be acted upon locally by the disk controller 112 or may be
passed on to the host system via BMIC 118 to permit the operator or operating system to take any corrective
action necessary. One means of corrective action which may be taken locally is the automatic remapping of a
faulty sector to a new sector on the same disk. This method of automatically remapping a faulty disk sector is
described in co-pending U.S. Application Serial No. 656,340 to Richard Ewert et al. for AUTOMATIC HARD
DISK BAD SECTOR REMAPPING, filed July 20, 1990 and assigned to Compaq Computer Corporation.
However, it is understood that other methods of corrective action, such as manual intervention and regeneration of
the data contained within the faulty disk sector, may be taken. Control of the processor 122 transfers to step
540. If it is determined in step 536 that the completed read request did not return an error code, control transfers
to step 540 in which the processor 122 removes this current read request from the disk controller 112 queue.
This is necessary to permit the memory 128 which has been allocated to the read request to be deallocated.
If the read request is not dequeued, the local memory 128 could not be deallocated and would rapidly be filled
with completed read requests, as well as new requests and data to be transferred. Control transfers to step 542
wherein the processor 122 deallocates the memory 128 which was allocated for return information from the
drive request. Control transfers to step 544 wherein processor 122 sets the NEXT_ACTION variable to issue
a new read request.
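
The result-checking sequence described above may be sketched as follows, purely as an illustration. The dictionary layout of `request`, the list used for `queue`, the `notify_failure` callback and the constant `ISSUE_READ` are all assumptions made for this sketch and are not taken from the specification.

```python
ISSUE_READ = "issue_read"  # hypothetical value of the NEXT_ACTION variable


def check_read_results(request, queue, notify_failure):
    """Model the handling of one completed background read request.

    `request` is assumed to be a dict with an 'error' flag and a 'buffer'
    entry, `queue` a list of requests held by the controller, and
    `notify_failure` a callback that reports a media failure.
    """
    if request["error"]:            # an error code was returned for the track
        notify_failure(request)     # notify the controller of the failure
    queue.remove(request)           # dequeue so the buffer can be released
    request["buffer"] = None        # release the transfer buffer
    return ISSUE_READ               # the next action is to issue a new read
```

Note that the request is dequeued and its buffer released whether or not an error was reported, which mirrors the convergence of both paths on step 540.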

In order to read all disk memory locations, it is necessary that the present invention increment to the next
applicable head, cylinder and drive as required to sequentially read all disk sectors. Control transfers to step
546 wherein processor 122 determines whether incrementing to the next head will exceed the maximum
number of heads for the drive currently being checked. If it is determined that the maximum number of heads for
the present drive will not be exceeded, control transfers to step 548, which increments to the next head for the
current drive. Control then transfers to step 564.

If it is determined in step 546 that the current head is the last valid head designated for the present drive,
control transfers to step 550, which sets the current head pointer to the first head for a drive. Control transfers
to step 552 wherein the processor 122 determines if the current cylinder is at the last cylinder for the drive. If
not on the last cylinder, control transfers to step 554 which increments the current cylinder to the next cylinder
on the current drive. Control then transfers to step 564. If it is determined in step 552 that the current cylinder
is the last valid cylinder for the current drive, control transfers to step 556 which sets the current cylinder equal
to the first cylinder for a drive. Control then transfers to step 558 which determines whether the current drive
is the last valid drive in the array. If the last valid drive in the array, control transfers to step 560 which sets the
current drive equal to the first drive in the array. If it is determined in step 558 that the current drive is not the
last drive in the array, control transfers to step 562 which increments to the next drive in the array.
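
The head, cylinder and drive progression just described behaves like a three-digit odometer, and may be illustrated by the following sketch. The function name `next_location`, the zero-based numbering of heads and drives, the `first_cylinder` parameter and the assumption that every drive shares the same geometry are introduced for illustration only; the sketch also omits the skipping of bad drives.

```python
def next_location(head, cylinder, drive,
                  last_head, last_cylinder, last_drive, first_cylinder=0):
    """Return the (head, cylinder, drive) of the next track to be read."""
    if head < last_head:
        # More heads remain on this cylinder: advance the head only.
        return head + 1, cylinder, drive
    head = 0  # wrap to the first head
    if cylinder < last_cylinder:
        # More cylinders remain on this drive: advance the cylinder.
        return head, cylinder + 1, drive
    cylinder = first_cylinder  # wrap to the first cylinder
    if drive < last_drive:
        # More drives remain in the array: advance the drive.
        return head, cylinder, drive + 1
    # Last drive reached: wrap to the first drive and start over.
    return head, cylinder, 0
```

Repeated calls visit every track of every drive once before returning to the starting location, which is the looping behavior attributed to the task.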

Thus the SURFACE_ANALYSIS task 500 will progressively issue read requests for all heads for a cylinder
for a drive, all cylinders for a drive and then for all drives in an array. Control transfers to step 564 which calls
the BUSY task 700 (Fig. 0) to determine if there are pending disk I/O tasks. Control transfers to step 566 which
determines whether the BUSY task 700 returns a TRUE code. If a TRUE code is returned, control transfers to
step 570 where execution pauses until reactivated by the TIMER task 680, which waits a predetermined time
after all drive requests are completed before reactivating the SURFACE_ANALYSIS task 500. After reactivation
control returns to step 512. If it is determined in step 566 that the BUSY code is FALSE, control transfers to
step 568 wherein the processor determines whether the local variable in which the number of pending I/O
requests was stored is equal to the number of currently pending I/O requests. If equal, no new pending I/O tasks
have been received and control transfers to step 512. If not equal, the SURFACE_ANALYSIS task 500 will conclude
that at least one new task has been received by the disk array controller 112 which has not been acted upon and
control transfers to step 570 wherein the SURFACE_ANALYSIS task 500 goes inactive until reactivated by the
TIMER task 680.

Thus the SURFACE_ANALYSIS task 500 will continue to loop and sequentially check all sectors within
the disk array in a background mode. If the disk array is actively processing I/O requests, the
SURFACE_ANALYSIS task 500 will go into an inactive state for a predetermined period of time.
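
The decision to continue or to go inactive may be summarized by the following sketch, offered as an illustration only. The function name `should_go_inactive` and its parameters are hypothetical; `busy` stands for the code returned by the BUSY task 700, and the two counts stand for the saved and the current number of pending I/O requests.

```python
def should_go_inactive(busy, saved_pending, current_pending):
    """Return True when the background task should pause until reactivated."""
    if busy:
        # The controller reports pending disk I/O: yield to foreground work.
        return True
    # Not busy, but a changed count means a new request has arrived that
    # has not yet been acted upon.
    return saved_pending != current_pending
```

When the function returns False the task proceeds to issue the next background read; otherwise it waits for the TIMER task 680.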

C. INITIALIZE 

Fig. 6 is a flow diagram of the INITIALIZE task 600 which is used to set the initial parameters for the
generation of the read requests by the SURFACE_ANALYSIS task 500. Operation of the INITIALIZE task 600
begins at step 602 wherein the processor 122 determines whether information which describes the disk array,
including parameters such as heads, cylinders, sectors, sectors per track and other information, for all disks
within the array is present. This is known as the GLOBAL RIS (reserved information sectors) for the reserved
disk sectors on the disk where such information is stored and is initialized upon computer system setup. If not
present, control transfers to step 604 and operation of the task ceases. If the GLOBAL RIS is present, control
transfers to step 606 where the processor 122 determines whether a RESET code, which is set by RESTART
task 650, is present. If the RESET code is present, control transfers to step 608, which sets all parameters to
the first head, first cylinder and first disk within the array. Control then transfers to step 610. If it is determined
in step 606 that no RESET code is set, such as when the INITIALIZE task 600 is called by
SURFACE_ANALYSIS task 500, control transfers to step 610 where the processor 122 initializes a disk counter. Control
transfers to step 612 which sets the current logical volume to the volume of which the current disk is a member.
Control transfers to step 614.

In step 614, the local processor 122 determines whether the current unit is a valid logical unit for the array
and whether the current unit is part of a logical unit which includes a fault tolerance technique. It should be
noted that a logical unit identifies not only the fault tolerant disk associated with the unit, but the data disks as
well. If true, control transfers to step 616 wherein processor 122 initializes all variables associated with a disk
request, such as number of heads, cylinders, disks, etc., to the disk parameters stored within the GLOBAL RIS.
Control transfers to step 618 wherein the processor 122 tests whether the current cylinder is set to cylinder
zero, which would indicate a drive which has never been used, or whether the RESET code is TRUE. If either
condition is true, control transfers to step 620 which sets the current cylinder equal to cylinder 1. Control
thereafter transfers to step 622. If both conditions are false in step 618, control of processor 122 transfers to step
622 wherein the valid fault tolerant drive count is incremented. Control then transfers to step 626. If in step 614
it is determined that the current unit is not a valid logical unit or that the current unit is not set for fault tolerant
operation, control transfers to step 624, in which the drive parameters are set to indicate that the current drive
is also faulty as a result of the current logical unit being faulty. Control transfers to step 626. In step 626, the
local processor 122 determines whether there are additional drives within the array. If there are additional
drives, control of the processor 122 transfers to step 630 which increments to the next drive and control then
is transferred to step 614. If there are no further drives in the array, control transfers to step 628 which returns
to the calling task.
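
The per-drive loop of the INITIALIZE task 600 may be illustrated by the following sketch. The function name `count_fault_tolerant_drives` and the dictionary keys `valid_unit`, `fault_tolerant`, `cylinder` and `bad` are hypothetical and chosen only for this example; loading of the remaining drive parameters from the GLOBAL RIS is omitted.

```python
def count_fault_tolerant_drives(drives, reset=False):
    """Mark unusable drives and return the count of usable ones.

    Each drive is assumed to be a dict with 'valid_unit', 'fault_tolerant'
    and 'cylinder' keys; a 'bad' flag is written back for later use.
    """
    count = 0
    for drive in drives:
        if drive["valid_unit"] and drive["fault_tolerant"]:
            # Cylinder zero marks a drive that has never been used, and a
            # RESET code restarts the scan; both begin at cylinder 1.
            if drive["cylinder"] == 0 or reset:
                drive["cylinder"] = 1
            drive["bad"] = False
            count += 1
        else:
            # Drives outside a valid fault tolerant unit are flagged so the
            # surface analysis can skip them.
            drive["bad"] = True
    return count
```

A returned count of zero corresponds to the condition under which the calling task halts, since there is then nothing to analyze.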

D. SCHEDULE OPERATIONS 

Figures 7A and 7B are portions of a procedure used to process logical requests sent to the disk array. These
portions are used to manage the operation of the TIMER task 680 in the present invention. In the preferred
embodiment, these procedures are an integral part of the scheduling task for the disk array running on local
processor 122. However, it is understood that these tasks could be implemented as independent functions for
the management of timing events in the present invention.

In Figure 7A, at step 800 the local processor 122 receives a logical request. Control transfers to step 802
wherein the local processor 122 determines whether the current logical request count is equal to zero and
whether the TIMER as initialized in the RESTART task 650 or elsewhere is running. If both conditions are true,
control transfers to step 804, wherein the local processor 122 stops and clears the one-shot timer task operating
under the control of the AMX operating system. Control is then transferred to step 806 which increments the
logical request count. Thus, when the local processor 122 receives a logical request, the one-shot timer task
is halted and the present invention goes inactive. Control then proceeds to other processing steps necessary
when a request has been received.

In Figure 7B, the section of code represents the completion of a logical request. In step 810, the local
processor 122 decrements the logical request count upon completion of the logical request. Control transfers to
step 812 wherein the local processor 122 determines whether the logical request count is equal to zero and
the current logical unit is set for either mirror or parity fault tolerance mode and whether the one-shot timer task
has been stopped. If all of the above conditions are true, control transfers to step 814 wherein the local processor
122 resets the one-shot timer task to the time indicated in the GLOBAL RIS and starts the timer running. When
the one-shot task completes, the TIMER task 680 is called. If another request is received prior to the one-shot
task completing, the local processor 122 will enter a task path which will result in it halting and clearing the
one-shot timer as set forth in Fig. 7A. Thus, the present invention will reset and start the one-shot timer upon
completion of all pending operations. When the one-shot timer task completes, the TIMER task 680 itself is
called.
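
The interplay between the logical request count and the one-shot timer may be illustrated by the following sketch. The class `RequestTracker` and its attributes are hypothetical; the timer is modeled as a simple flag and a remaining-time value rather than as a task under the AMX operating system, and `delay` stands for the time indicated in the GLOBAL RIS.

```python
class RequestTracker:
    """Hypothetical model of the request count and one-shot timer."""

    def __init__(self, delay, fault_tolerant=True):
        self.delay = delay                  # assumed delay from GLOBAL RIS
        self.fault_tolerant = fault_tolerant
        self.pending = 0                    # logical request count
        self.timer_running = False
        self.timer_remaining = None

    def request_received(self):
        # A request arriving while idle stops and clears the one-shot timer.
        if self.pending == 0 and self.timer_running:
            self.timer_running = False
            self.timer_remaining = None
        self.pending += 1

    def request_completed(self):
        self.pending -= 1
        # Restart the timer only when nothing is pending, the unit is fault
        # tolerant, and the timer is currently stopped.
        if (self.pending == 0 and self.fault_tolerant
                and not self.timer_running):
            self.timer_remaining = self.delay
            self.timer_running = True
```

Under this model the timer runs only while the controller is idle, so its expiry indicates that the full delay has elapsed without a new request.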

E. TIMER 


Figure 8 is a flow diagram of the TIMER task 680. Operation begins at step 684 following the completion
of the one-shot timer task wherein the local processor 122 calls the BUSY task 700 to determine if the disk
controller 112 is performing I/O operations. Upon return from the BUSY task 700, control transfers to step 686
[...] the step of determining if the disk array controller has received, but not acted upon, an input or output
operation request.

The method of any of claims 1 to 3, wherein the step of determining whether the disk array is active further
includes the step of determining whether the disk on which the current disk memory location is located is
currently being regenerated.

The method of any of claims 1 to 4, further including the preliminary step of determining disk and disk array
parameters and current status information.

The method of any of claims 1 to 5, wherein the step of performing a read operation includes the step of
generating a read request for the current disk memory location and queuing the read request for execution
by the disk array controller.

The method of claim 5, wherein the step of determining disk and disk array parameters further includes
the step of reading disk and disk array parameter information from a reserved disk memory location of a
disk.

A computer system having a disk array subsystem (116), the computer system including

(a) means (20) for initializing a current disk memory location based upon disk array and disk drive status
information and current disk parameters;

(b) means (20) for determining whether the disk array (116) is in the process of carrying out disk
operations, and if carrying out such operations, suspending operation of the method for a predetermined
period of time and then repeating these steps until disk operations are not being carried out;

(d) means (122) for performing a read operation on the current disk memory location;

(e) means (122) for checking the results of the read operation to determine if the read operation has
failed and upon detection of a failure, providing an indication of disk media failure for the current disk
memory location;

(f) means (20) for incrementing to a successive disk memory location; and

(g) means (20,122) for continually performing steps (b) through (e) for all disk memory locations for all
disks located within the disk array (116).

A computer system according to claim 8, further including means (122) for determining disk and disk array
parameters and current status information.

A computer system according to claim 10, wherein the means (122) for determining disk and disk array
parameters and current status information comprises means for reading disk and disk array parameter
information from a reserved disk memory location of a disk.